ABSTRACT

In recent decades, several methods have been developed to extract moving objects in the
presence of a dynamic background. However, most of them use a global threshold and ignore the
correlation between neighboring pixels. To address these issues, this paper presents a new
approach that generates a probability image using the Kernel Density Estimation (KDE) method
and then applies Maximum A Posteriori estimation in a Markov Random Field (MAP-MRF) to this
probability image in order to build an energy function. This function is minimized by the
binary graph cut algorithm to detect the moving pixels, instead of applying a thresholding step.
The proposed method was tested on various video sequences, and the results show its
effectiveness in the presence of a dynamic scene, compared to other background subtraction models.
1. INTRODUCTION

The separation of moving pixels from their background is an essential phase in many computer
vision fields, especially video surveillance, traffic monitoring and activity recognition. The main idea is
to create a statistical model of the background, that is, a representation of the background image
built from previous frames using density functions [1], either per pixel or per region. This
representation is then compared against the input frame to obtain a binary mask that marks the position
of the moving objects. However, the background is generally not static, so the model must be robust and
adaptive enough to overcome frequent issues such as gradual or sudden illumination changes and
non-stationary backgrounds [2].

Different background modeling techniques have been proposed to address these limitations
[2]. The traditional ones are based on pixel intensity: they exploit only the intensity value to decide whether a
pixel belongs to the background or to a moving object. Despite their promising performance, they misclassify
some pixels, especially when the background and the foreground have the same color, because
they ignore the spatial dependencies between neighboring pixels. Models based on texture features [3]
have demonstrated a certain degree of success in exploiting the spatial correlation; they use
discriminative texture measures as features to distinguish moving pixels from the background. However, they
still have shortcomings, such as the use of a threshold to detect the moving pixels. Recently, several
methods based on deep learning have appeared [4], which aim at handling all of the above limitations. However,
they require a training phase with several annotated examples, which needs more computational time.

In order to tackle some of these issues, we first generate a probability image using the KDE method.
Then, instead of using a threshold to segment this image into foreground and background, we perform the
binary segmentation of the probability image by minimizing an energy function with the graph cut
algorithm, which exploits the spatial correlation of neighboring pixels.

The remainder of this paper is organized as follows. The second section presents a theoretical
review. The third section presents the research method: how the probability image is constructed
using KDE, and how moving objects are detected using a graph cut algorithm. The fourth section
discusses the experimental results, and the last section concludes the paper.


2. THEORETICAL REVIEW 

In order to detect moving objects using pixel intensity values, Stauffer et al. [5] proposed to
model each pixel with a mixture of K Gaussian distributions. This model determines whether a
pixel belongs to the background or the foreground by comparing it with each Gaussian distribution.
The Gaussian distributions are initialized using an Expectation Maximization algorithm;
however, the number of Gaussian distributions and the parameters of each distribution must be initialized.
Zivkovic [6] improves upon the widely used parametric Gaussian Mixture Model by introducing an
on-line clustering algorithm to separate foreground clusters from background ones. In order to exploit
spatial information and deal with highly dynamic scenes, Haiying et al. [7] proposed a modified
Gaussian mixture background model based on the spatio-temporal distribution, which uses both temporal and
spatial distribution information. The work in [8] proposes an effective scheme, based on statistical
learning, for modelling and adaptively updating a background in dynamic scenes, and [9] proposes an
adaptation of the Mixture of Gaussians (MOG) approach for detecting moving objects, in which the foreground
is extracted in the HSV color space.

To avoid the parameter initialization required by GMM, a non-parametric approach called Kernel
Density Estimation (KDE), which can effectively adapt to a dynamic background, was proposed [10]. However,
KDE has to keep N frames in memory, which is time consuming when N is large. Park et al. [11] used the
Bayesian rule with the KDE method and applied a histogram approximation to decrease the computational
cost. The method in [12] follows a non-parametric background modeling paradigm in which each location in a
dynamic scene collects a set of samples at different spatial scales, drawn from past frames and from its
neighborhood.

On the other hand, Kim et al. [13] proposed a codebook method which uses a clustering
technique to model the background and initializes the codewords of each codebook to store background states,
where codewords are a series of key color values. Next, Shah et al. [14] designed a Self-Adaptive CodeBook
(SACB) background model. This model is a block-based structure that uses the close proximity of local
neighborhoods. In this method, an exponential smoothing filter is adapted to maintain the mean and variance
values in order to automatically estimate the brightness boundary threshold and discolorations for each
codeword. The work in [15] presents a dynamic codebook method to address such challenges. The dynamic
codebook aims to significantly improve the conventional codebook technique by introducing a
technique that makes the boundary of each codeword dynamic.

In order to deal robustly with varying illumination conditions, the use of textures was
proposed in a block-wise processing approach and extended to the pixel level by Heikkila [16].
These methods use discriminative texture features, computed with a Local Binary Pattern (LBP), to
capture background statistics. In order to consider dynamic textures, a Volume
Local Binary Pattern (VLBP) operator is proposed in [17], which consists of concatenated LBP histograms
from three orthogonal planes. To combine spatial and temporal information, [18] proposed a
Spatio-Temporal Local Binary Pattern (STLBP) operator, consisting of a weighted sum of two consecutive
LBP histograms, to alleviate the computational cost imposed by VLBP. LBP histograms provide a robust
way to cope with illumination changes in dynamic scenes. Nevertheless, they do not provide a principled
way to evaluate the distance of new observations to the background models. The authors of [19] presented a
modified Local Binary Similarity Pattern (LBSP) descriptor to set up the background model in feature space;
unlike LBP, the LBSP descriptor is calculated from absolute differences.

Deep learning, especially convolutional neural networks (CNNs), has recently become very popular
and has been used successfully in moving object detection. Reference [20] proposes two robust
encoder-decoder type neural networks that generate multi-scale feature encodings in different ways and can
be trained end-to-end using only a few training samples. Wang et al. [21] proposed a highly accurate
semi-automatic method for segmenting foreground moving objects in surveillance videos.
They implement an end-to-end model based on a multi-resolution convolutional neural network (CNN) with a
cascaded architecture, which does not need a large number of examples to accurately fit the data. With the
aim of addressing the complex nature of dynamic scenes in real surveillance tasks, the authors of [22]
presented a simple and efficient vector-based method in which the concept of linear dependence of vectors
is used to build the background model corresponding to each pixel, and linear independence is used to
detect moving objects in the incoming video sequence. Babaee et al. [23] propose a new approach to estimate
the background model from video sequences, in which feature engineering and parameter tuning become
unnecessary, since the network parameters can be learned from data by training a single CNN that can
handle various video scenes. To train the CNN, they used a random 5% of the video frames and their ground
truth segmentations from the Change Detection challenge 2014 (CDnet 2014), and applied spatial-median
filtering as post-processing of the network outputs.


3. RESEARCH METHOD
3.1. Probability Image using Improved KDE

Kernel density estimation [10] is a non-parametric approach which estimates the probability density
of a pixel using a sample of data. The probability image is calculated with KDE in three steps,
which are explained in the following subsections.


3.1.1. Probability Density of a Pixel using KDE Model
Considering $\{x_1, x_2, \ldots, x_n\}$ a sample of pixel values from previous frames, the estimated
probability density of the pixel value $x_t$ at time $t$ is in general given by:

$$P(x_t) = \frac{1}{n} \sum_{i=1}^{n} K(x_t - x_i) \tag{1}$$

where $n$ denotes the number of samples and $K(x)$ is a kernel function with bandwidth $\sigma$,
which should satisfy these three conditions:

$$\begin{aligned}
&1)\; K(x) \ge 0 \\
&2)\; \int K(x)\,dx = 1 \\
&3)\; \int x\,K(x)\,dx = 0 \quad (K \text{ should be symmetric})
\end{aligned} \tag{2}$$


Several kernel functions satisfy the above conditions, such as the Gaussian, Epanechnikov,
triangular and uniform kernels. In this paper we choose the Gaussian kernel,
which can be described as:

$$K(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \tag{3}$$


Using (3), equation (1) becomes:

$$P(x_t) = \frac{1}{n} \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=1}^{n} \exp\left(-\frac{(x_t - x_i)^2}{2\sigma^2}\right) \tag{4}$$
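
As an illustration of equation (4), the following minimal Python sketch evaluates the Gaussian KDE density of one pixel value from its sample history. The function name `kde_probability`, the sample values and the bandwidth are assumptions chosen for the example, not values from the experiments.

```python
import math

def kde_probability(x_t, samples, sigma):
    """Gaussian KDE estimate of the density at x_t from past pixel values."""
    n = len(samples)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = sum(math.exp(-((x_t - x_i) ** 2) / (2.0 * sigma ** 2))
                for x_i in samples)
    return norm * total / n

# Hypothetical gray-level history of one pixel and an assumed bandwidth.
history = [100, 102, 101, 99, 100]
p_background_like = kde_probability(101, history, sigma=2.0)
p_object_like = kde_probability(180, history, sigma=2.0)
```

A value close to the history receives a much higher density than a value far from it, which is what later makes background pixels bright and moving pixels dark in the probability image.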


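where $\sigma$ is estimated using the median or the average $m$ of $|x_i - x_{i+1}|$ over each
consecutive pair $(x_i, x_{i+1})$ in the sample of pixel values [10]:

$$\sigma = \frac{m}{0.68\sqrt{2}} \tag{5}$$

$$m = \frac{1}{n} \sum_{i=1}^{n-1} |x_i - x_{i+1}| \quad \text{or} \quad m = \operatorname{median}_{i=1 \ldots n-1}\left(|x_i - x_{i+1}|\right)$$

Each Gaussian kernel describes just one sample, and $\sigma$ is the same for all kernels of the
same pixel.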
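
The bandwidth estimate of equation (5) can be sketched as follows, using the median variant. The name `estimate_bandwidth` and the lower bound `min_sigma` (which avoids a zero bandwidth when a pixel never changes) are assumptions added for this illustration.

```python
import math
import statistics

def estimate_bandwidth(samples, min_sigma=0.5):
    """sigma = m / (0.68 * sqrt(2)), with m the median absolute difference
    between consecutive samples. min_sigma is an assumed safety floor."""
    diffs = [abs(a - b) for a, b in zip(samples, samples[1:])]
    m = statistics.median(diffs)
    sigma = m / (0.68 * math.sqrt(2.0))
    return max(sigma, min_sigma)

# Hypothetical gray-level history of one pixel.
history = [100, 102, 101, 99, 100]
sigma = estimate_bandwidth(history)
```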
3.1.2. Updating Background Model

In reality, the background is never completely static, but changes over time. It is therefore necessary to
adapt the background model to those changes by updating the sample of pixel values in FIFO (First In First
Out) order as follows.
Let $x_i$ denote a pixel value in the sample and $x_t$ the current pixel value.

If $x_t$ is the value of a background pixel, then: $x_i = \alpha x_i + (1 - \alpha)\, x_t$


where $\alpha$ is an empirical weight, often chosen as a tradeoff between stability and quick update [10].
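
The following sketch shows one possible reading of this update, in which the oldest sample is removed from the buffer, blended with the current value, and appended as the newest entry. This combination of FIFO replacement and blending, the function name `update_sample` and the example values are all assumptions made for illustration.

```python
from collections import deque

def update_sample(sample, x_t, alpha, is_background):
    """Update the per-pixel sample buffer in FIFO order (assumed reading)."""
    if not is_background:
        return sample  # foreground values do not enter the background model
    oldest = sample.popleft()
    sample.append(alpha * oldest + (1.0 - alpha) * x_t)
    return sample

# Hypothetical buffer of three past values for one pixel.
buf = deque([100.0, 102.0, 101.0])
update_sample(buf, 110.0, alpha=0.2, is_background=True)
```

A small `alpha` makes the model follow the current frame quickly, while a large `alpha` keeps it close to the stored history.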

In general, the probability density reflects the variation of the pixel value. If the value changes
frequently over a period of time, its probability will be small and the pixel is more likely
to belong to a moving object. By contrast, if the pixel keeps the same value, or changes only slightly,
it is very likely a background pixel.


3.1.3. Probability Image
After calculating the probability density $P(x_t)$ using (1), each pixel value in the probability image is
computed as $ctse \times P(x_t)$, where the value of $ctse$ is set to 255 in order to transform the
probability value into a grayscale value.
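
A minimal sketch of this scaling step is given below. Rounding and clipping to the range 0..255 are assumptions added here, since a density value can exceed 1 when the bandwidth is small; the function name `to_probability_image` is also hypothetical.

```python
def to_probability_image(prob_map, ctse=255):
    """Scale a 2-D list of probabilities to 8-bit grayscale values."""
    return [[min(255, max(0, int(round(ctse * p)))) for p in row]
            for row in prob_map]

# Hypothetical 2x2 map of per-pixel probabilities.
gray = to_probability_image([[0.0, 0.4], [1.0, 0.2]])
```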

Figure 1(b) shows an example of a probability image, where the dark zone represents pixels with
lower probability (moving object) and the white zone represents pixels with higher probability
(background). The problem therefore consists of separating the moving object from the background. The KDE
model uses a global threshold to segment the moving object from the background, and ignores the correlation
that exists between the intensities of neighboring pixels. To address these limits, we use the graph
cut algorithm to extract moving pixels from their background.





Figure 1. (a) Original image, (b) Estimated probability image


3.2. Moving Object Detection based on Graph Cut 

The moving object detection problem can be viewed as a binary labeling task: label (1) for object
pixels and (0) for background pixels, or vice versa. Choosing a threshold to detect moving objects
is difficult, as Figure 2 demonstrates: the threshold used in (b) generates more false positive pixels,
while the threshold used in (c) eliminates true positive pixels. To overcome this issue, we segment the
pixels into background and moving pixels through the minimization of an
energy function using graph cut.


3.2.1. Energy Function
Consider $X = \{x_1, x_2, \ldots, x_N\}$, an observation set of the current probability image $I$ in a
video sequence. Our objective is to assign a label from the binary set
$Y = \{y_p \mid y_p \in \{0, 1\}\}$ to each pixel in $I$, which is equivalent to maximizing the
posterior probability $P(Y / X)$.

Bayes' law allows us to write the posterior probability $P(Y / X)$ as follows:

$$P(Y / X) = \frac{P(X / Y)\, P(Y)}{P(X)} \tag{6}$$

Since $P(X)$ is a constant term, maximizing $P(Y / X)$ is equivalent to maximizing $P(X / Y)\, P(Y)$:

$$\arg\max_{y \in Y} P(Y / X) = \arg\max_{y \in Y} P(X / Y)\, P(Y) \tag{7}$$





Figure 2. Results of the KDE method using a fixed threshold. (a) Original image, (b) result obtained
using P(x) > threshold = 0.8, (c) result obtained using P(x) > threshold = 0.2.


Assume that the observations in $X$ are conditionally independent given $Y$; then:

$$P(X / Y) = \prod_{p=1}^{N} P(x_p / y_p) \tag{8}$$

where

$$P(x_p / y_p) = \exp\left(-D_p(y_p)\right) \tag{9}$$

$D_p(y_p)$ is the data energy; it measures how coherent the current labeling $y_p$ is with the observed data.

Using (8) and (9) we can write $P(X / Y)$ as follows:

$$P(X / Y) = \exp\left(-\sum_{p=1}^{N} D_p(y_p)\right) \tag{10}$$


$P(Y)$ is the prior probability. By the Hammersley-Clifford theorem, a Markov Random Field can be
expressed as a Gibbs distribution, here with a four-neighborhood system $\mathcal{N}$:

$$P(Y) = \exp\left(-\sum_{(p,q) \in \mathcal{N}} V_{p,q}(y_p, y_q)\right) \tag{11}$$

$V_{p,q}(y_p, y_q)$ denotes the smoothness energy; it penalizes two neighboring pixels $p$ and $q$ when
their labels $y_p$ and $y_q$ are different.


Using (10) and (11) we can write:

$$\arg\max_{y \in Y} P(Y / X) = \arg\max_{y \in Y} \exp\left(-\sum_{p=1}^{N} D_p(y_p) - \sum_{(p,q) \in \mathcal{N}} V_{p,q}(y_p, y_q)\right) \tag{12}$$

Finally, maximizing the posterior probability $P(Y / X)$ is equivalent to minimizing the energy function
$E$ given by:

$$E(y) = \sum_{p \in I} D_p(y_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(y_p, y_q) \tag{13}$$

The minimization of the energy function $E$ is equivalent to finding a minimum cut in a graph $G = (V, E')$.
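
To make equation (13) concrete, the sketch below evaluates the energy of a given labeling on a small image with a four-neighborhood. It assumes that the smoothness term is paid only when two neighboring labels differ; the function name `energy`, the cost values and the constant pair weight are hypothetical.

```python
def energy(labels, data_cost, pair_weight):
    """E(y) = sum of data costs + sum of pairwise penalties over 4-neighbors.

    labels[r][c] is 0 or 1, data_cost[r][c][l] is the cost of label l at
    pixel (r, c), and pair_weight(p, q) returns the penalty paid when the
    neighboring pixels p and q receive different labels (an assumption).
    """
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += data_cost[r][c][labels[r][c]]
            # Right and down neighbors only, so each pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    total += pair_weight((r, c), (rr, cc))
    return total

# Hypothetical 1x3 image: the middle pixel weakly prefers label 1.
costs = [[[0.0, 2.0], [1.0, 0.8], [0.0, 2.0]]]
e_smooth = energy([[0, 0, 0]], costs, lambda p, q: 0.5)
e_isolated = energy([[0, 1, 0]], costs, lambda p, q: 0.5)
```

In this example the uniform labeling has the lower energy, because the two pairwise penalties outweigh the small data-cost gain of relabeling the middle pixel. This is the effect that a per-pixel threshold cannot reproduce.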


3.2.2. Moving Object Detection by Minimization of Energy Function

Our aim is to segment the probability image into background and moving object. This segmentation is
estimated as a global minimum of the energy function $E$ (13), computed by a standard minimum cut
algorithm [24]-[25], as demonstrated in Figure 3.

Firstly, we create a graph $G = (V, E')$ for each probability image, where $V$ stands for the set of nodes
(pixels) plus two terminal nodes connected to every pixel, named S (Source) and T (Sink). $E'$ represents the
set of edges connecting two adjacent nodes with weight $V_{p,q}(y_p, y_q)$, $(p,q) \in \mathcal{N}$,
and the edges connecting S and T to every node $p$ in $I$ with weights $D_p^{Background}(y_p)$ and
$D_p^{obj}(y_p)$ respectively.


Figure 3. Image segmentation using Graph Cut [24]


Therefore, the binary labeling problem is to assign each node in $V$ a unique label $y_p$, where
$y_p \in \{0, 1\}$. Secondly, once the graph is built, Boykov and Jolly [24] solve the labeling of the
pixels through a cut on the graph, as shown in Figure 3. This cut severs two types of links:

a)  T-links: a cut removes one of the two edges that connect a pixel with a terminal S or T node, associating 
it to the object or background class. 
b) N-links: a cut removes the links between pairs of pixels associated to different terminals. 

A cut C in a graph is a subset of edges which separates the nodes into two parts: one part belongs to
the source and the other belongs to the sink. The cost of a cut is the sum of the weights of its edges.
The minimum cut of the graph generates an optimal segmentation of the image.
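
The cut described above can be illustrated on a toy graph. The sketch below uses a plain Edmonds-Karp max-flow (a simpler substitute for the specialized algorithm of [24]-[25]) and returns the nodes that remain attached to the source. The three-pixel graph, its weights and the function name `min_cut_source_side` are assumptions made for this example.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the source side of a minimum cut."""
    # Residual graph with zero-capacity reverse edges where none exist.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    def reachable():
        # Breadth-first search along edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if sink not in parent:
            return set(parent)  # nodes still connected to the source
        # Find the bottleneck on the augmenting path, then push flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

# Hypothetical graph: T-links to S and T, N-links between neighbors.
graph = {
    "S": {"p1": 4.0, "p2": 1.0, "p3": 0.5},
    "p1": {"T": 0.5, "p2": 2.0},
    "p2": {"T": 1.2, "p1": 2.0, "p3": 0.3},
    "p3": {"T": 4.0, "p2": 0.3},
}
source_side = min_cut_source_side(graph, "S", "T")
```

Here `p2` taken alone would be slightly cheaper to attach to T, but its strong N-link to `p1` keeps it on the source side, so the minimum cut severs the weak N-link between `p2` and `p3` instead.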

In order to calculate the energy function $E$, we calculate $D_p(y_p)$ and $V_{p,q}(y_p, y_q)$ as follows:

$$D_p^{Background}(y_p) = \exp\left(-\frac{(x_p - \mu_B)^2}{2\sigma_B^2}\right) \tag{14}$$

$$D_p^{obj}(y_p) = \exp\left(-\frac{(x_p - \mu_{obj})^2}{2\sigma_{obj}^2}\right) \tag{15}$$

$$V_{p,q}(y_p, y_q) = \frac{1}{dist(p,q)} \exp\left(-\frac{(x_p - x_q)^2}{2\sigma^2}\right) \tag{16}$$


where:

$\mu_B$: the peak of the background distribution in the histogram shown in Figure 4(b).
$\mu_{obj}$: the peak of the object distribution in the histogram shown in Figure 4(b).

$\sigma_B$ and $\sigma_{obj}$: the variance of the background and object distributions respectively.
$x_p$, $x_q$: pixel values in the probability image.

$dist(p,q)$: Euclidean distance between pixels $p$ and $q$.
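
The sketch below shows how such terms could be computed for one pixel and one pair of neighbors. It assumes Gaussian-shaped data terms centered on each class peak and a contrast-sensitive smoothness term divided by the pixel distance; the function names, the treatment of `sigma` as a spread parameter that is squared, and the numeric values are assumptions for illustration only.

```python
import math

def data_term(x_p, mu, sigma):
    """Gaussian-shaped affinity of pixel value x_p to a class (mu, sigma)."""
    return math.exp(-((x_p - mu) ** 2) / (2.0 * sigma ** 2))

def smooth_term(x_p, x_q, p, q, sigma):
    """Pairwise weight: large for similar neighbors, scaled by 1 / dist(p, q)."""
    dist = math.hypot(p[0] - q[0], p[1] - q[1])
    return math.exp(-((x_p - x_q) ** 2) / (2.0 * sigma ** 2)) / dist

# Hypothetical class statistics for a probability image.
mu_b, sigma_b = 220.0, 14.0
mu_obj, sigma_obj = 25.0, 4.0
d_bkg = data_term(215.0, mu_b, sigma_b)      # bright pixel, near background peak
d_obj = data_term(215.0, mu_obj, sigma_obj)  # same pixel, far from object peak
v_similar = smooth_term(215.0, 213.0, (0, 0), (0, 1), sigma=10.0)
v_edge = smooth_term(215.0, 30.0, (0, 0), (0, 1), sigma=10.0)
```

Neighbors with similar probability values get a large pairwise weight, so cutting between them is expensive, while the weight across a strong edge is close to zero.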

As shown in Figure 4(b), the histogram of the probability image is bimodal: $\mu_{obj}$ is the mean of the
first distribution and $\mu_B$ is the mean of the second distribution. In general, background pixels are
more numerous than object pixels, and the probability value of the background is higher than that of the
object, so $\mu_B > \mu_{obj}$.

$\mu_{obj}$, $\mu_B$ and $\sigma_{obj}$, $\sigma_B$ are calculated using the K-means algorithm applied to
the probability image, with the number of clusters set to 2.
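
A minimal one-dimensional K-means with two clusters is sketched below to show how the two means and variances could be obtained from the probability image values. Initializing the centers at the minimum and maximum value, the function name `two_means` and the sample values are assumptions of this example.

```python
def two_means(values, max_iter=50):
    """1-D K-means with K = 2.

    Returns ((mean, variance) of the low cluster,
             (mean, variance) of the high cluster or None if it is empty).
    """
    c_low, c_high = float(min(values)), float(max(values))
    low, high = list(values), []
    for _ in range(max_iter):
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        if not high:  # all values identical: there is no second mode
            break
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if new_low == c_low and new_high == c_high:
            break  # assignments are stable
        c_low, c_high = new_low, new_high

    def stats(cluster):
        mean = sum(cluster) / len(cluster)
        var = sum((v - mean) ** 2 for v in cluster) / len(cluster)
        return mean, var

    return stats(low), (stats(high) if high else None)

# Hypothetical probability image values: few dark (object), more bright (background).
pixels = [20, 25, 30, 200, 210, 220, 230, 240]
(mu_obj, var_obj), (mu_b, var_b) = two_means(pixels)
```

The low cluster corresponds to the object mode and the high cluster to the background mode, consistent with the ordering of the two means stated above.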











Figure 4. (a) Probability image, (b) Histogram of probability image


4. RESULTS AND ANALYSIS 

In order to validate the proposed technique qualitatively, three different types of scenes, acquired
with a fixed camera, were selected from the CDnet2014 video dataset.
The first is highway from the Baseline sequences, which shows a road with vehicles and a non-stationary
background due to moving trees on the left of the image; the frame size is 320x240. The second
is pedestrians from the Baseline sequences, which is appropriate for testing the effect of illumination
change; the frame size is 360x240. The third is fountain1 from the Dynamic Background sequences,
which represents a dynamic background scenario due to the presence of moving trees and a fountain;
the frame size is 432x288.

To verify the performance of the proposed method, we compare our results to those obtained using
the Gaussian Mixture Model (GMM) [5] and the Kernel Density Estimation method (KDE) [10]. The qualitative
results are illustrated in Figure 5, where the original frames of the video sequences are shown in the first
column, the results obtained using the Gaussian Mixture Model in the second column, those of the
KDE method in the third column, those of the proposed method in the fourth column,
and the ground truth images in the fifth column.

The proposed method and the comparison methods were implemented in C++ using OpenCV 2.4.9.
GMM was tested with number of Gaussians = 3, number of history frames = 100 and learning rate
α = 0.2, whereas the KDE method was implemented with number of frames in the
buffer = 100, learning rate α = 0.2 and threshold = 0.6.

As illustrated in Figure 5, the results on the first sequence indicate that all methods successfully
detected the foreground, which may be explained by the fact that the sequence is simple. However,
the proposed method is more effective at suppressing false positive and false negative pixels. In the
fountain sequence, each method copes to some degree with the uninteresting movements of the
background due to the moving trees and the fountain. However, the results of GMM and KDE contain both
missed foreground detections and false foreground detections, whereas the combination of KDE and graph
cut improves the quality of detection, especially by correcting misclassified foreground and background
pixels. Finally, the results on the pedestrians sequence show that our method adapts effectively to the
illumination change of the scene.

In the quantitative analysis, three performance metrics, Precision, Recall and F-Measure,
were used. They are defined as follows:

$$Precision = \frac{TP}{TP + FP} = \frac{\text{Number of true positives detected}}{\text{Number of positives detected}}$$

$$Recall = \frac{TP}{TP + FN} = \frac{\text{Number of true positives detected}}{\text{Total number of true positives in ground truth}}$$

$$F\text{-}Measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where TP denotes true positives, FP false positives and FN false negatives.
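
These three metrics can be computed from a predicted mask and a ground truth mask as in the short sketch below. The flat-list mask representation, the function name `precision_recall_f` and the choice of returning 0.0 when a denominator is zero are assumptions of the example.

```python
def precision_recall_f(pred, truth):
    """Compute Precision, Recall and F-Measure from two binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)

# Hypothetical flattened masks (1 = moving pixel, 0 = background).
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
precision, recall, f_measure = precision_recall_f(pred, truth)
```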

Among these metrics, we are especially interested in the F-measure, which is commonly accepted as a
good indicator of the overall performance of background subtraction methods. By definition, a good
algorithm is one that produces a small number of false positives and false negatives, and a high
F-measure score.

The average recall, average precision and average F-measure of each method are calculated
over all Baseline and Dynamic Background sequences and reported in Table 1. The table shows that the
proposed scheme produces a better F-measure than the other approaches on the Dynamic Background
sequences. On the Baseline sequences it outperforms all approaches except KDE, which obtains the
best score.


Table 1. Average metric scores of the proposed and other methods. Note that these results were obtained
from the CDnet 2014 challenge website


Sequences            Metrics     GMM [5]   KDE [10]   [unreadable]   DCB [15]   Proposed
Baseline             Precision   0.8461    0.9223     0.8870         0.9070     0.9011
                     Recall      0.8180    0.8969     0.8137         0.7123     0.8465
                     F-Measure   0.8245    0.9092     0.8450         0.7695     0.8729
Dynamic background   Precision   0.5989    0.5732     0.5515         0.7632     0.6021
                     Recall      0.8344    0.8012     0.7392         0.5803     0.6815
                     F-Measure   0.6330    0.5961     0.5953         0.6149     0.6393

As demonstrated above, combining KDE with graph cut improves the quality of the results, because it
exploits the correlation that exists between the intensities of neighboring pixels, which yields
higher detection accuracy in the presence of a dynamic background.
[Figure 5 image panels. Column titles: Original Image, KDE Method, Proposed Method, Ground Truth. Frames shown: 720, 925, 584, 350, 1125, 82, 69, 1150.]

Figure 5. Results of the proposed, GMM and KDE methods
5. CONCLUSION

The purpose of this paper is to detect moving objects against the background while tackling two
issues: exploiting the correlation between neighboring pixels and avoiding the use of a global threshold.
First, we generate a probability image using the Kernel Density Estimation method. Second, instead of
thresholding the probability image to extract moving pixels, we use the graph cut algorithm to
minimize an energy function, which is equivalent to classifying each pixel as background or
foreground. This approach was tested on the CDnet2014 video dataset, and the results demonstrate
its effectiveness in foreground detection, with lower memory and time requirements than the
other methods.